Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing

Authors

  • David Oleson
  • Alexander Sorokin
  • Greg P. Laughlin
  • Vaughn Hester
  • John Le
  • Lukas Biewald

Abstract

Crowdsourcing is an effective tool for scalable data annotation in both research and enterprise contexts. Due to crowdsourcing’s open participation model, quality assurance is critical to the success of any project. Present methods rely on EM-style post-processing or manual annotation of large gold standard sets. In this paper we present an automated quality assurance process that is inexpensive and scalable. Our novel process relies on programmatic gold creation to provide targeted training feedback to workers and to prevent common scamming scenarios. We find that it decreases the amount of manual work required to manage crowdsourced labor while improving the overall quality of the results.
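
The abstract does not spell out how programmatic gold is generated. The Python sketch below is one hedged illustration of the general idea, built around a hypothetical listing-verification task. Known-correct records are turned into gold units automatically. Some are left intact, and some are deliberately corrupted in ways that mimic common worker mistakes. Each unit carries a targeted feedback message. A worker who answers "yes" to everything, a typical scamming pattern, fails the corrupted units. The field names, the two corruption rules (`alter_phone`, `clip_name`) and the `MIN_GOLD_ACCURACY` threshold are all assumptions made for illustration. They are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable

MIN_GOLD_ACCURACY = 0.7  # hypothetical cut-off for flagging a worker as suspect

Record = dict[str, str]


@dataclass(frozen=True)
class GoldUnit:
    """A test question whose correct answer is known in advance."""
    record: Record
    expected: str  # "yes" = record is correct, "no" = record was corrupted
    feedback: str  # targeted message shown when a worker misses this unit


def alter_phone(record: Record) -> tuple[Record, str]:
    # Hypothetical rule: bump the last digit so the number is guaranteed to differ.
    phone = record["phone"]
    wrong = phone[:-1] + str((int(phone[-1]) + 1) % 10)
    return {**record, "phone": wrong}, "Check every digit of the phone number."


def clip_name(record: Record) -> tuple[Record, str]:
    # Hypothetical rule: drop the final character, a small error careless workers miss.
    return {**record, "name": record["name"][:-1]}, "Compare the name letter by letter."


CORRUPTIONS: list[Callable[[Record], tuple[Record, str]]] = [alter_phone, clip_name]


def make_gold(verified: list[Record]) -> list[GoldUnit]:
    """Turn each verified record into one 'yes' unit plus one 'no' unit per corruption rule."""
    gold = []
    for record in verified:
        gold.append(GoldUnit(record, "yes", "This listing was correct; answer 'yes'."))
        for corrupt in CORRUPTIONS:
            wrong, hint = corrupt(record)
            gold.append(GoldUnit(wrong, "no", hint))
    return gold


def grade(answers: dict[int, str], gold: list[GoldUnit]) -> tuple[float, list[str]]:
    """Return accuracy on the answered gold units and the feedback for each miss."""
    answered = [(gold[i], a) for i, a in answers.items()]
    if not answered:
        return 0.0, []
    misses = [unit.feedback for unit, a in answered if a != unit.expected]
    return 1 - len(misses) / len(answered), misses


if __name__ == "__main__":
    gold = make_gold([{"name": "Blue Door Cafe", "phone": "555-0142"}])
    accuracy, tips = grade({0: "yes", 1: "yes", 2: "yes"}, gold)  # an "always yes" worker
    print(round(accuracy, 2), accuracy < MIN_GOLD_ACCURACY)  # 0.33 True
    for tip in tips:
        print(tip)
```

In this sketch every negative unit is derived mechanically from a verified record. New gold can therefore be produced in bulk as more data is verified, instead of being annotated by hand. The per-unit feedback string is what makes the training feedback targeted to the specific mistake.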

Similar resources

Worker Perception of Quality Assurance Mechanisms in Crowdsourcing and Human Computation Markets

Many human computation systems utilize crowdsourcing marketplaces to recruit workers. Because of the open nature of these marketplaces, requesters need to use appropriate quality assurance mechanisms to guarantee high quality results. Previous research has mostly focused on the statistical aspects of quality assurance. Instead, we analyze the worker perception of five quality assurance mechanis...

Effective Quality Assurance for Data Labels through Crowdsourcing and Domain Expert Collaboration

Researchers and scientists have been using crowdsourcing platforms to collect labeled training data in recent years. The process is cost-effective and scalable, but research has shown that the quality of truth inference is unstable due to worker bias, work variance, and task difficulty. In this demonstration, we present a hybrid system, named IDLE (Integrated Data Labeling Engine), that brings ...

Mentor: A Visualization and Quality Assurance Framework for Crowd-Sourced Data Generation

Crowdsourcing is a feasible method for collecting labeled datasets for training and evaluating machine learning models. Compared to the expensive process of generating labeled datasets using dedicated trained judges, the low cost of data generation in crowdsourcing environments enables researchers and practitioners to collect significantly larger amounts of data for the same cost. However, crow...

Behavior-Based Quality Assurance in Crowdsourcing Markets

Quality assurance in crowdsourcing markets has appeared to be an acute problem over the last years. We propose a quality control method inspired by Statistical Process Control (SPC), commonly used to control output quality in production processes and characterized by relying on time-series data. Behavioral traces of users may play a key role in evaluating the performance of work done on crowdso...

Ontology Quality Assurance with the Crowd

The Semantic Web has the potential to change the Web as we know it. However, the community faces a significant challenge in managing, aggregating, and curating the massive amount of data and knowledge. Human computation is only beginning to serve an essential role in the curation of these Web-based data. Ontologies, which facilitate data integration and search, serve as a central component of t...

Publication date: 2011